40 research outputs found

    An application of distributional semantics for the analysis of the Holy Quran

    In this contribution we illustrate the methodology and the results of an experiment we conducted by applying Distributional Semantics Models to the analysis of the Holy Quran. Our aim was to gather information on the potential differences in meanings that the same words might take on when used in Modern Standard Arabic w.r.t. their usage in the Quran. To do so we used the Penn Arabic Treebank as a contrastive corpu
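
As a rough illustration of the kind of comparison described above, the following minimal sketch builds co-occurrence vectors for the same word in two corpora and compares them by cosine similarity. It is not the authors' method: the window size, the raw-count weighting, and the function names `context_vector` and `cosine` are assumptions made only for this example.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, target, window=2):
    """Count the words that co-occur with `target` within +/- `window` tokens."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

A low cosine between the vector built from one corpus and the vector built from the other would suggest that the word is used in different contexts in the two corpora. A realistic setup would also need tokenization and lemmatization suited to Arabic, which this sketch leaves out.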

    Ontology Learning and Semantic Annotation: a Necessary Symbiosis

    Semantic annotation of text requires the dynamic merging of linguistically structured information and a "world model", usually represented as a domain-specific ontology. On the other hand, engineering a domain ontology through a semi-automatic ontology learning system requires a considerable amount of semantically annotated documents. Facing this bootstrapping paradox requires an incremental process of annotation-acquisition-annotation, whereby domain-specific knowledge is acquired from linguistically annotated texts and then projected back onto the texts so that extra linguistic information can be annotated and further knowledge layers extracted. The presented methodology is a first step in the direction of a full "virtuous" circle where the semantic annotation platform and the evolving ontology interact in symbiosis. As a case study we have chosen the semantic annotation of product catalogues. We propose a hybrid approach that combines pattern matching techniques, which exploit the regular structure of product descriptions in catalogues, with Natural Language Processing techniques, which are used to analyze natural language descriptions. The semantic annotation accesses the ontology, which is semi-automatically bootstrapped with an ontology learning tool from annotated collections of catalogues

    Sharing Cultural Heritage: the Clavius on the Web Project

    In the last few years the number of manuscripts digitized and made available on the Web has been constantly increasing. However, there is still a considerable lack of results concerning both the explicit representation of their content and the tools developed to make it available. The objective of the Clavius on the Web project is to develop a Web platform exposing a selection of Christophorus Clavius's letters along with three different levels of analysis: linguistic, lexical and semantic. The multilayered annotation of the corpus involves an XML-TEI encoding followed by a tokenization step where each token is uniquely identified through a CTS URN notation and then associated with a part of speech and a lemma. The text is lexically and semantically annotated on the basis of a lexicon and a domain ontology, the former structuring the most relevant terms occurring in the text and the latter representing the domain entities of interest (e.g. people, places, etc.). Moreover, each entity is connected to linked and non-linked resources, including DBpedia and VIAF. Finally, the results of the three layers of analysis are gathered and shown through interactive visualization and storytelling techniques. A demo version of the integrated architecture was developed

    Leveraging a Narrative Ontology to Query a Literary Text

    In this work we propose a model for the representation of the narrative of a literary text. The model is structured in an ontology and a lexicon constituting a knowledge base that can be queried by a system. This narrative ontology, as well as describing the actors, locations, situations found in the text, provides an explicit formal representation of the timeline of the story. We will focus on a specific case study, that of the representation of a selected portion of Homer's Odyssey, in particular of the knowledge required to answer a selection of salient queries, formulated by a literary scholar. This work is being carried out within the framework of the Semantic Web by adopting models and standards such as RDF, OWL, SPARQL, and lemon among others

    La Modellazione Diacronica di Risorse Termino-Ontologiche nell'Ambito delle Digital Humanities: Esperimenti su Clavius

    Abstract. In this work, we present an experiment in the modeling of a diachronic termino-ontological resource named CLAVIUS through both the N-ary relations model and the 4D-fluents approach. Some of the salient differences between these two models are discussed. The overall objective of this research is to illustrate the main advantages and disadvantages of adopting a given model to build diachronic resources. Introduction. Panta rhei is the famous expression attributed by Plato to Heraclitus. Everything is subject to the inexorable law of change: reality, the categories through which we organize it, and the words we use to talk about it. What tools are available to today's digital humanist who has to represent this diachronic evolution of concepts and terms in a given domain explicitly and formally, so that the formalization is machine-computable? In recent years, and particularly within the Digital Humanities, emphasis has been placed on the importance of working with the technologies underlying the Semantic Web and Linked Open Data, in order to ensure interoperability and reuse of resources within the scientific community. From this perspective, ontologies, and OWL, their standard representation language, play a fundamental role. However, the fundamentally static nature of the latter and the need to model aspects of temporal evolution seem at first sight irreconcilable.
The reflections presented in this article stem from the experience gained within the Clavius on the Web project (http://claviusontheweb.it, last accessed 13/10/2016). Among the goals of the project is the creation of a termino-ontological resource (RTO) representing the evolution of mathematical and astronomical theories from antiquity to the 16th-17th centuries, as described by Clavius in his Euclidis Elementorum Libri XV. Accessit XVI and In sphaeram Ioannis de Sacro Bosco Commentarius. Context. As noted in the Introduction, the OWL language (and its extension OWL2) is the W3C standard for creating and sharing ontologies on the Semantic Web. In particular, OWL DL implements the description logic SHOIN(Dn), which offers greater expressivity than RDF and RDFS without compromising decidability and the inference mechanism. However, OWL is a static language; in it, properties and relations between entities are fundamentally binary, expressed as triples <Subject predicate Object>. This syntactic restriction complicates the representation

    Il Sistema Traduco nel Progetto Traduzione del Talmud Babilonese

    Within the Babylonian Talmud Translation Project (Progetto Traduzione del Talmud Babilonese), the Istituto di Linguistica Computazionale of the CNR has developed Traduco, a collaborative web tool with features that make it particularly suited to translating texts that pose interpretative problems. To date, Computer-Assisted Translation (CAT) tools are typically used to translate technical manuals, legislative texts or websites, and their main purpose is to speed up the translation process. Traduco retains most of the standard components of a traditional CAT tool but extends them with specific features needed to support the interpretation and translation of complex texts that pose particular comprehension problems. In this article we present a specific case study concerning a text with these characteristics: the Babylonian Talmud. Traduco includes functions for adding notes, bibliographic references and semantic annotations, and for creating glossaries. Translators, revisers, editors, supervisors and end users who access the system are supported throughout the translation process, from the interpretation of the source text to the editorial phase for printing the translations, through the use of computer-assisted translation technologies, semantic annotation of the text, enrichment of the translations with explanatory information, export of the translations to XML and TEI, and the integration of natural language processing techniques. The design and development of Traduco required a multidisciplinary approach combining aspects of software engineering, computational linguistics, knowledge engineering and digital publishing

    Chapter Rappresentazione, costruzione e visualizzazione di risorse terminologiche diacroniche nell’era del web semantico

    This article introduces the DIATERM model, devoted to representing the diachronic evolution of concepts and terms in a given domain, according to Semantic Web standards and Linked Data technologies. The approach adopted for the representation of temporal information is based on the reification of N-ary relationships. DIATERM is articulated on three levels: textual, terminological and conceptual. Each level can be affected, more or less simultaneously, by change. The use of SWRL rules makes it possible to assign temporal information automatically, thus facilitating the construction of the terminological resource and highlighting any inconsistencies. Two examples of querying and visualizing diachronic terminological resources will be illustrated. The first example is taken from the resource dedicated to the astronomical terminology introduced by Christopher Clavius in his commentary on Sacrobosco's Tractatus de Sphaera. The second example is taken from the electronic lexicon of Ferdinand de Saussure's linguistic terminology
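
The idea of reifying an N-ary relationship can be sketched as follows. A binary triple such as (term, denotes, concept) has no slot for a time interval, so the relation itself becomes a node to which the term, the concept and the interval are attached. This is only an illustrative sketch: the class `Denotation`, the property names `hasTerm`, `hasConcept`, `validFrom` and `validTo`, and the helper functions are hypothetical and are not taken from DIATERM's actual vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Denotation:
    """Reified relation node: `term` denotes `concept` from `start` to `end` (years)."""
    term: str
    concept: str
    start: int
    end: int


def to_triples(node_id, d):
    """Expand one reified relation into plain binary triples about the relation node."""
    return [
        (node_id, "hasTerm", d.term),        # hypothetical property names
        (node_id, "hasConcept", d.concept),
        (node_id, "validFrom", d.start),
        (node_id, "validTo", d.end),
    ]


def concepts_at(denotations, term, year):
    """Return the concepts that `term` denotes in the given year."""
    return {
        d.concept
        for d in denotations
        if d.term == term and d.start <= year <= d.end
    }
```

With placeholder data such as `Denotation("term_A", "concept_1", 1500, 1550)` and `Denotation("term_A", "concept_2", 1551, 1600)`, calling `concepts_at` for the year 1520 returns only `concept_1`. The cost of the approach is visible in `to_triples`: one temporal statement expands into four binary triples.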

    When Translation Requires Interpretation: Collaborative Computer-Assisted Translation of Ancient Texts

    This paper introduces the main features of Traduco, a Web-based, collaborative Computer-Assisted Translation (CAT) tool developed to support the translation of ancient texts. In addition to the standard components offered by traditional CAT tools, Traduco includes a number of features designed to ease the translation of ancient texts, such as the Babylonian Talmud, posing specific structural, stylistic, linguistic and hermeneutical challenges

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018

    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall, "Cavallerizza Reale". The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges

    A hybrid approach for semantic relation extraction in ontology learning from text

    In this thesis we propose an unsupervised system for semantic relation extraction from texts. The automatic extraction of semantic relationships is crucial both in ontology learning from text and for semantic annotation, and represents a solution to the "knowledge acquisition bottleneck" in the context of the Semantic Web. The developed system, assessed on English and Italian but applicable to any other language, takes pairs of words as input and determines whether there is a semantic relationship between them. The initial pairs of terms are extracted from a "Target Corpus" by an unsupervised statistical system that determines whether two terms can be considered "distributionally similar", on the distributional-semantics assumption that "the meaning of a word is strongly related to the contexts in which it appears." To verify that there is actually a semantic relation between two terms and to determine its nature, the system searches a "Support Corpus" (the Web) for sentences in which both words appear in the context of "reliable" lexico-syntactic patterns (low recall but high precision), as, for example, the words "steer" and "car" in the phrase "the steer is part of the car". This thesis describes the overall process that led to the development of the RelEx system, starting from the definition and application of the lexico-syntactic patterns, and including the measures used to assess the reliability of the specific semantic relations that the system suggests. The work focuses on the semantic relations of hyponymy ("is_a"), meronymy ("part_of") and co-hyponymy (i.e. two terms are hyponyms of the same term, as "lion" and "tiger" with respect to "feline"). The approach may however be extended to extract other relationships by changing the battery of reliable patterns used.
The precision of the system was evaluated as 83.3% for hyponymy, 75% for meronymy and 72.2% for co-hyponymy, demonstrating the validity of the proposed approach. In this work, in addition to the novel concepts of "Closed Pattern" and "Open Pattern", two new techniques are described. The first, called "trans-language boosting", applies reliable patterns and pairs of terms expressed in different languages with the aim of increasing the performance of the system. The second, called "cross-reference near-synonymy extraction", is based on the application of "open" patterns for the recognition of near-synonymy relations
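
The pattern-based verification step described above can be sketched in a few lines. The two regular expressions below are hypothetical stand-ins chosen for this example (the first mirrors the phrase quoted in the abstract); they are not the pattern battery actually used by RelEx, and the function name `find_relations` is likewise an assumption.

```python
import re

# Hypothetical high-precision patterns, one per relation type.
PATTERNS = {
    "part_of": re.compile(r"\bthe (\w+) is part of the (\w+)\b"),
    "is_a": re.compile(r"\bthe (\w+) is a kind of (\w+)\b"),
}


def find_relations(pair, sentences):
    """Return (relation, first_term, second_term) for each pattern match
    in `sentences` that involves exactly the two words in `pair`."""
    a, b = pair
    found = []
    for sentence in sentences:
        text = sentence.lower()
        for relation, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                x, y = match.groups()
                # Accept the pair in either order; the pattern fixes the direction.
                if (x, y) == (a, b) or (x, y) == (b, a):
                    found.append((relation, x, y))
    return found
```

For the candidate pair `("car", "steer")` and the sentence "The steer is part of the car.", the function reports a `part_of` relation from "steer" to "car". Patterns this rigid miss most paraphrases, which is the low-recall, high-precision trade-off the abstract mentions.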